Abstract

We present a new method for analyzing the running time of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel running time. This allows us to rigorously estimate the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the possible speedup increases with the density of the topology. Surprisingly, even sparse topologies like ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors yields asymptotically optimal speedups, thus giving hints on how to parametrize parallel evolutionary algorithms.
1 Introduction

Due to the increasing number of CPU cores, exploiting possible speedups by parallel 
computations is nowadays more important than ever. Parallel evolutionary algorithms 
(EAs) form a popular class of heuristics with many applications to computationally expensive problems [23, 25, 38]. This includes island models, also called distributed EAs, multi-deme EAs, or coarse-grained EAs. Evolution is parallelized by evolving subpopulations, called islands, on different processors. Individuals are periodically exchanged in a process called migration, where selected individuals, or copies of these, are sent to other islands, according to a migration topology that determines which islands are neighboring. More fine-grained models are also known, where neighboring subpopulations communicate in every generation, first and foremost in cellular EAs [38].

By restricting the flow of information through spatial structures and / or infrequent 
communication, diversity in the whole system is increased. Researchers and practition- 
ers frequently report that parallel EAs speed up the computation time, and at the same 
time lead to a better solution quality [23].

Despite these successes, a long history [4], and very active research in this area [12, 23, 31], the theoretical foundation of parallel EAs is still in its infancy. The impact of even the most basic parameters on performance is not well understood [33].
*A preliminary version of this paper with parts of the results was published at PPSN 2010 [19].



Past and present research is mostly empirical, and a solid theoretical foundation is 
missing. Theoretical studies are mostly limited to artificial settings. In the study of 
takeover times, one asks how long it takes for a single optimum to spread throughout the whole parallel EA, if the EA uses only selection and migration, but neither mutation nor crossover [30, 31]. This gives a useful indicator for the speed at which communication is spread, but it does not give any formal results about the running time of evolutionary algorithms with mutation and/or crossover.

One way of gaining insight into the capabilities and limitations of parallel EAs is 
by means of rigorous running time analysis [39]. By asymptotic bounds on the running
time we can compare different implementations of parallel EAs and assess the speedup 
gained by parallelization in a rigorous manner. 

In [18] the authors presented the first running time analysis of a parallel evolutionary algorithm with a non-trivial migration topology. It was demonstrated for
a constructed problem that migration is essential in the following way. A suitably 
parametrized island model with migration has a polynomial running time while the 
same model without migration as well as comparable panmictic populations need exponential time, with overwhelming probability. Neumann, Oliveto, Rudolph, and Sudholt [26] presented a similar result for island models using crossover. If islands perform crossover with immigrants during migration, this can drastically speed up optimization. This was demonstrated for a pseudo-Boolean example as well as for instances of the VertexCover problem [26].

In this work we take a broader view and consider the speedup gained by paral- 
lelization for various common pseudo-Boolean functions and function classes of vary- 
ing difficulty. A general method is presented for proving upper bounds on the parallel 
running time of parallel EAs. The latter is defined as the number of generations of the 
parallel EA until a global optimum is found for the first time. This allows us to estimate 
the speedup gained by parallelization, defined as the ratio of the expected parallel run- 
ning time of an island model and the expected running time for a single island. It also 
can be used to determine how to choose the number of islands such that the parallel 
running time is reduced as much as possible, while still maintaining an asymptotically 
optimal speedup. 

Our method is based on the fitness-level method or method of $f$-based partitions, a simple and well-known tool for the analysis of evolutionary algorithms [39]. The main idea of this method is to divide the search space into sets $A_1, \dots, A_m$, strictly ordered according to fitness values of elements therein. Elitist EAs, i.e., EAs where the best fitness value in the population can never decrease, can only increase their current best fitness. If, for each set $A_i$, we know a lower bound $s_i$ on the probability that an elitist EA finds an improvement, i.e., a new search point in a better fitness-level set $A_{i+1} \cup \dots \cup A_m$, this gives rise to an upper bound $\sum_{i=1}^{m-1} 1/s_i$ on the expected running time. The method is described in more detail in Section 2.

In Section 3 we first derive a general upper bound for parallel EAs, based on fitness levels. Our general method is then tailored towards different spatial structures often used in fine-grained or cellular evolutionary algorithms and parallel architectures in general: ring graphs (Theorem 4 in Section 4), torus graphs (Theorem 6 in Section 5), hypercubes (Theorem 8 in Section 6), and complete graphs (Theorems 10 and 12 in Section 7).

The only assumption made is that islands run elitist algorithms, and that in each 
generation each island has a chance of transmitting individuals from its best current fit- 
ness level to each neighboring island, independently with probability at least p. We call 
the latter the transmission probability. It can be used to model various stochastic effects
such as disruptive variation operators, the impact of selection operators, probabilistic 
migration, probabilistic emigration and immigration policies, and transient faults in 
the network. This renders our method widely applicable to a broad range of settings. 

1.1 Main Results 

Our estimates of parallel running times from Theorems 4, 6, 8, 10, and 12 are summarized in the following theorem, which characterizes our main results. Throughout this work $\mu$ always denotes the number of islands.

Theorem 1. Consider an island model with $\mu$ islands where each island runs an elitist EA. For each island let there be a fitness-based partition $A_1, \dots, A_m$ such that for all $1 \le i < m$ all points in $A_i$ have a strictly worse fitness than all points in $A_{i+1}$, and $A_m$ contains all global optima. We say that an island is in $A_i$ if the best search point on the island is in $A_i$. Let $s_i$ be a lower bound for the probability that in one generation a fixed island in $A_i$ finds a search point in $A_{i+1} \cup \dots \cup A_m$.

Further assume that for each edge in the migration topology in every iteration there is a probability of at least $p$ that the following holds, independently from other edges and for all $1 \le i < m$: if the source island is in $A_i$, then after the generation the target island is in $A_i \cup \dots \cup A_m$.
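Then the expected parallel running time of the island model is bounded by

1. $O\left(\frac{1}{p^{1/2}}\sum_{i=1}^{m-1}\left(\frac{1}{s_i}\right)^{1/2} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}\right)$ for every ring graph or any other strongly connected¹ topology,

2. $O\left(\frac{1}{p^{2/3}}\sum_{i=1}^{m-1}\left(\frac{1}{s_i}\right)^{1/3} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}\right)$ for every undirected grid or torus graph with side lengths at least $\sqrt{\mu} \times \sqrt{\mu}$,

3. $O\left(\frac{m\log(\mu) + \sum_{i=1}^{m-1}\log(1/s_i)}{p} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}\right)$ for the $(\log \mu)$-dimensional hypercube graph,

4. $O(m/p) + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$ for the complete topology $K_\mu$, as well as

5. the bound from Theorem 12 for the complete topology $K_\mu$ when the transmission probability is small, whose first term depends on $\min\{p\mu, 1\}$ and whose second term is again $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$.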

A remarkable feature of our method is that it can automatically transfer upper 
bounds for panmictic EAs to parallel versions thereof. The only requirement is that 
bounds on panmictic EAs have been derived using the fitness-level method, and that the partition $A_1, \dots, A_m$ and the probabilities for improvements $s_1, \dots, s_{m-1}$ used therein are known. Then the expected parallel time of the corresponding island model can be estimated for all mentioned topologies simply by plugging the $s_i$ into Theorem 1. Fortunately, many published runtime analyses use the fitness-level method, either explicitly or implicitly, and the mentioned details are often stated or easy to derive. Hence even researchers with limited expertise in runtime analysis can easily reuse previous analyses to study parallel EAs.

¹ A directed graph is strongly connected if for each pair of vertices $u, v$ there is a directed path from $u$ to $v$ and vice versa.

Further note that we can easily determine which choice of $\mu$, the number of islands, will give an upper bound of order $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$, the best upper bound we can hope for using the fitness-level method. In all bounds from Theorem 1 we have a first term that varies with the topology and $p$, and a second term that is always $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$. The first term reflects how quickly information about good fitness levels is spread throughout the island model. Choosing $\mu$ such that the second term becomes asymptotically as large as the first one, or larger, we get an upper bound of $O\left(\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}\right)$. For settings where $\sum_{i=1}^{m-1}\frac{1}{s_i}$ is an asymptotically tight upper bound for a single island, this corresponds to an asymptotic linear speedup. The maximum feasible value for $\mu$ depends on the problem, the topology, and the transmission probability $p$.
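The choice of $\mu$ described above can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions. It takes the ring bound of Theorem 1 as the two terms $p^{-1/2}\sum_i (1/s_i)^{1/2}$ and $\frac{1}{\mu}\sum_i 1/s_i$, dropping all constants hidden in the $O$-notation. It also assumes the standard level probabilities $s_i = (n-i)/(en)$, $i = 0, \dots, n-1$, of the (1+1) EA on OneMax. The helper names (`onemax_levels`, `ring_terms`, `largest_balanced_mu`) are hypothetical. Because constants are ignored, the returned $\mu$ only indicates an order of growth.

```python
import math


def onemax_levels(n):
    """Assumed success probabilities s_i = (n - i) / (e n), i = 0..n-1,
    for the (1+1) EA on OneMax (standard fitness-level lower bounds)."""
    return [(n - i) / (math.e * n) for i in range(n)]


def ring_terms(s, mu, p=1.0):
    """Both terms of the ring bound, with O-constants dropped:
    first  = p^(-1/2) * sum_i (1/s_i)^(1/2)   (spread of information)
    second = (1/mu)   * sum_i 1/s_i           (parallel search effort)"""
    first = sum((1.0 / s_i) ** 0.5 for s_i in s) / math.sqrt(p)
    second = sum(1.0 / s_i for s_i in s) / mu
    return first, second


def largest_balanced_mu(s, p=1.0):
    """Largest integer mu for which the second term is still >= the first."""
    first, _ = ring_terms(s, 1, p)
    total = sum(1.0 / s_i for s_i in s)
    return max(1, math.floor(total / first))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, largest_balanced_mu(onemax_levels(n)))
```

For OneMax the balanced value grows roughly like $\ln n$ in this sketch, since $\sum_i 1/s_i = enH_n$ while $\sum_i (1/s_i)^{1/2} = \Theta(n)$.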
Table 1: Asymptotic bounds on expected parallel ($T^{par}$, number of generations) and sequential ($T^{seq}$, number of function evaluations) running times and expected communication efforts ($T^{com}$, total number of migrated individuals) for various n-bit functions and island models with $\mu$ islands running the (1+1) EA and using migration probability $p = 1$. The number of islands $\mu$ was always chosen to give the best possible upper bound on the parallel running time, while not increasing the upper bound on the sequential running time by more than a constant factor. For unimodal functions $d + 1$ denotes the number of function values. See [7] for bounds for the (1+1) EA. Results for $\mathrm{Jump}_k$ were restricted to $3 \le k = O(n/\log n)$ for simplicity. All upper bounds for OneMax and LO stated here are asymptotically tight, as follows from general results in [1, 35].

We give simple examples that demonstrate how our method can be applied. Our examples are from pseudo-Boolean optimization, but the method works in any setting where the fitness-level method is applicable. The simple (1+1) EA is used on each island (see Section 2 for details). Table 1 summarizes the resulting running time bounds for the considered algorithms and problem classes. For simplicity we assume $p = 1$; a more detailed table for general transmission probabilities is presented in the appendix, see Table 2. The number of islands $\mu$ was chosen as explained above: to give the smallest possible parallel running time, while not increasing the sequential time, asymptotically. The table also shows the expected communication effort, defined as the total number of individuals migrated throughout the run. This quantity is proportional to the expected parallel running time, with a factor depending on the number of islands and the topology. Details are given in Theorems 4, 6, 8, 10, and 12. The functions used in this table are explained in Section 2. Table 2 in the appendix shows all our results for a variable number of islands $\mu$ and variable transmission probabilities $p$.

The method has already found a number of applications and it has spawned a number of follow-up papers. After the preliminary version of this work [19] was presented, the authors applied it to various problems from combinatorial optimization: the sorting problem (as maximizing sortedness), finding shortest paths in graphs, and Eulerian cycles [21]. Very recently, Mambrini, Sudholt, and Yao [24] also used it for studying how quickly island models find good approximations for the NP-hard SetCover problem. This work has also led to the discovery of simple adaptive schemes for changing the number of islands dynamically throughout the run, see Lässig and Sudholt [20]. These schemes lead to near-optimal parallel running times, while asymptotically not increasing the sequential running time on many examples [20]. These schemes are tailored towards island models with complete topologies, which include offspring populations as a special case. The study of offspring populations in comma strategies is another recent development that was inspired by this work [ ,29].

2 Preliminaries 

In our example applications we consider the maximization of a pseudo-Boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. It is easy to adapt the method for minimization. The number of bits is always denoted by $n$. The following well-known example functions have been chosen because they exhibit different probabilities for finding improvements in a typical run of an EA. For a search point $x \in \{0,1\}^n$ write $x = x_1 \dots x_n$; then $\mathrm{OneMax}(x) := \sum_{i=1}^{n} x_i$ counts the number of ones in $x$ and $\mathrm{LO}(x) := \sum_{i=1}^{n}\prod_{j=1}^{i} x_j$ counts the number of leading ones in $x$, i.e., the length of the longest prefix containing only 1-bits. A function is called unimodal if every non-optimal search point has a Hamming neighbor (i.e., a point with Hamming distance 1 to it) with strictly larger fitness. Observe that LO is unimodal as flipping the first 0-bit results in a fitness increase. For LO every non-optimal point has exactly one Hamming neighbor with a better fitness.
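For $1 \le k \le n$ we also consider

$$\mathrm{Jump}_k(x) := \begin{cases} k + \sum_{i=1}^{n} x_i & \text{if } \sum_{i=1}^{n} x_i \le n-k \text{ or } x = 1^n,\\ \sum_{i=1}^{n} (1 - x_i) & \text{otherwise.} \end{cases}$$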



This function has been introduced by Droste, Jansen, and Wegener [7] as a function with tunable difficulty, as evolutionary algorithms typically have to perform a jump to overcome a gap by flipping $k$ specific bits. It is also interesting because it is one of very few examples where crossover has been proven to be essential [14, 17].
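The three example functions translate directly into code. Below is a minimal Python sketch. It assumes that bit strings are represented as sequences of 0/1 integers, a representation chosen here only for illustration.

```python
from typing import Sequence


def onemax(x: Sequence[int]) -> int:
    """OneMax: number of ones in x."""
    return sum(x)


def leading_ones(x: Sequence[int]) -> int:
    """LO: length of the longest prefix of x that contains only 1-bits."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def jump(x: Sequence[int], k: int) -> int:
    """Jump_k: k + |x|_1 if |x|_1 <= n - k or x = 1^n, and n - |x|_1 otherwise."""
    n = len(x)
    ones = sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones  # inside the gap: fitness decreases towards the optimum
```

For example, with $n = 4$ and $k = 2$ the all-ones string scores $6$, whereas a string with three ones lies in the gap and scores $1$.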

We are interested in the following performance measures. First we define the parallel running time $T^{par}$ as the number of generations until the first global optimum is evaluated. The sequential running time $T^{seq}$ is defined as the number of function evaluations until the first global optimum is evaluated. It thus captures the overall effort
across all processors. In both measures we allow ourselves to neglect the cost of the 
initialization as this only adds a fixed term to the running times. 

The speedup is defined as the ratio of the expected running time of a single island and the expected running time of a parallel EA with $\mu$ islands. This corresponds to the notion of a weak orthodox speedup in Alba's taxonomy. If the speedup is at least of order $\mu$, i.e., if it is $\Omega(\mu)$, we speak of a linear speedup. In this work it is generally understood in an asymptotic sense, unless we call it a perfect linear speedup.

We also define the communication effort $T^{com}$ as the total number of individuals
migrated to other islands during the course of a run. Depending on the parallel ar- 
chitecture, communication between processors can be expensive in terms of time and 
bandwidth used. Therefore, this measure can be an important factor for determining 
the performance of a parallel EA. 

Our method for proving upper bounds is based on the fitness-level method [7, 39]. The idea is to partition the search space into sets $A_1, \dots, A_m$ called fitness levels that are ordered with respect to fitness values. We say that an algorithm is in $A_i$ or on level $i$ if the current best individual in the population is in $A_i$. An evolutionary algorithm where
the best fitness value in the population can never decrease (called an elitist EA) can only 
improve the current fitness level. If one can derive lower bounds on the probability of 
leaving a specific fitness level towards higher levels, this yields an upper bound on the 
expected running time. 

Theorem 2 (Fitness-level method). For two sets $A, B \subseteq \{0,1\}^n$ and a fitness function $f$ let $A <_f B$ if $f(a) < f(b)$ for all $a \in A$ and all $b \in B$. Partition the search space into non-empty sets $A_1, A_2, \dots, A_m$ such that $A_1 <_f A_2 <_f \dots <_f A_m$ and $A_m$ only contains global optima. For an elitist EA let $s_i$ be a lower bound on the probability of creating a new offspring in $A_{i+1} \cup \dots \cup A_m$, provided the population contains a search point in $A_i$. Then the expected number of iterations of the algorithm to find the optimum is bounded by

$$E(T) \le \sum_{i=1}^{m-1} \frac{1}{s_i}.$$
The fitness-level method has also been applied to other elitist optimization methods, including elitist ant colony optimizers [12, 27] and a binary particle swarm optimizer [37]. It gives rise to powerful tail inequalities [40] and it can be used to prove lower bounds as well, when combined with additional knowledge on transition probabilities [35]. Finally, Lehre [22] recently showed that the fitness-level method can be extended towards non-elitist EAs with additional mild conditions on transition probabilities and the population size.
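Consider the (1+1) EA on OneMax as a small worked example of Theorem 2. A common choice, assumed here rather than derived in this section, takes the levels to be the sets of search points with exactly $i$ ones, $i = 0, \dots, n$. This indexing is shifted relative to Theorem 2. From level $i < n$ an improvement occurs if exactly one of the $n-i$ zero-bits flips. This happens with probability at least $(n-i)\cdot\frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1} \ge \frac{n-i}{en}$. Summing the reciprocals gives $enH_n = O(n \log n)$, where $H_n$ is the $n$-th harmonic number. The Python sketch below evaluates this sum; the function names are hypothetical.

```python
import math


def fitness_level_bound(s):
    """Fitness-level upper bound sum_i 1/s_i on the expected number of iterations."""
    if any(s_i <= 0 for s_i in s):
        raise ValueError("all success probabilities must be positive")
    return sum(1.0 / s_i for s_i in s)


def onemax_success_probs(n):
    """Assumed lower bounds (n - i) / (e n) for leaving the level with i ones,
    i = 0..n-1 (exactly one of the n - i zero-bits flips)."""
    return [(n - i) / (math.e * n) for i in range(n)]


n = 100
bound = fitness_level_bound(onemax_success_probs(n))
harmonic = sum(1.0 / j for j in range(1, n + 1))
print(bound, math.e * n * harmonic)  # the two values agree up to rounding
```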

In the following we apply the fitness-level method to parallel EAs. For the consid- 
ered EAs we assume that there is a migration topology, given by a directed graph. Islands 
represent vertices of the topology and directed edges indicate neighborhoods between 
the islands. We often describe undirected graphs for use as migration topology, understanding that for an undirected edge $\{u, v\}$ we have two directed edges $(u, v)$ and $(v, u)$. In other words, though formally the migration topology is a directed graph, we often use the language of undirected graphs to describe it.

Our methods for proving upper bounds require that the islands run elitist evo- 
lutionary algorithms. All islands create new offspring independently by mutation 
and / or recombination among individuals in the island. In every generation there is 
a chance that migration will send an individual on the current best fitness level to some 
target island, and that this individual will be included on the target island. This would 
effectively increase the fitness level of the target island to the current best level (or an 
even better one). For every pair of connected islands, we call this probability the transmission probability and denote it by $p$. Note that transmission events on different pairs of islands are independent.
The transmission probability can model various settings where randomness and
stochasticity may be involved: 

• migrations do not take place in every generation, but only probabilistically with 
probability p, 

• islands do not automatically select individuals on the best fitness level for emigra- 
tion, but there is a probability of at least p that this happens, 

• similarly, islands do not automatically include immigrants on higher fitness levels, 
but only with probability at least p, 

• during migration crossover is performed, and $p$ is a lower bound on the probability that crossover does not disrupt the fitness of an individual on a current best fitness level (if a crossover probability $p_c$ is used, then clearly $p \ge 1 - p_c$),

• the physical architecture suffers from transient faults and p is a lower bound on 
the probability that migration is executed correctly. 

Of course, the transmission probability can also model any combination of the above, 
in which case the product of all above probabilities gives a lower bound on the trans- 
mission probability. 
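The product rule for combining several of these effects can be made concrete. The following Python sketch assumes that the individual events are independent: migration taking place, selection for emigration, acceptance by the target island, non-disruptive crossover, and absence of faults. Under that assumption the product of their probabilities is a lower bound on $p$. The function and parameter names are hypothetical and chosen only for illustration.

```python
def transmission_lower_bound(p_migrate=1.0, p_emigrate=1.0, p_accept=1.0,
                             p_crossover=0.0, p_fault=0.0):
    """Lower bound on the transmission probability p, assuming independence.

    p_migrate   -- probability that migration happens in a generation
    p_emigrate  -- probability that an individual on the best level is sent
    p_accept    -- probability that the target island includes the immigrant
    p_crossover -- crossover probability p_c; 1 - p_c bounds the chance that
                   crossover does not disrupt the migrant
    p_fault     -- probability of a transient network fault
    """
    factors = [p_migrate, p_emigrate, p_accept, 1.0 - p_crossover, 1.0 - p_fault]
    if not all(0.0 <= f <= 1.0 for f in factors):
        raise ValueError("all probabilities must lie in [0, 1]")
    result = 1.0
    for f in factors:
        result *= f
    return result


print(transmission_lower_bound(p_migrate=0.5, p_crossover=0.2, p_fault=0.01))
```

With migration in every second generation on average, a crossover probability of 0.2, and a 1% fault rate, this gives $0.5 \cdot 0.8 \cdot 0.99 = 0.396$.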

Most of our results also apply when instead of probabilistic migration a fixed migration interval $\tau$ is used. This is similar to a migration probability $p = 1/\tau$; in fact, it can be regarded as a derandomized or quasi-random version of probabilistic migration. With a fixed migration interval the variance in the information propagation is reduced, and all islands operate in synchronicity. Probabilistic migrations are asynchronous; this simplifies the analysis as we do not need to keep track of how much time has passed since the last migration. We expect our results for probabilistic migration to transfer to the study of migration intervals. The only notable exception is the case of a complete topology when the migration probability is rather small (Theorem 12), as there synchronous and asynchronous migrations lead to different effects.

As elaborated above, our method is robust and it applies in various settings, and 
for various types of EAs simulated on the islands. In our applications for illustrating 
concrete speedups for test problems, we use a simple (1+1) EA for all islands. The 
(1+1) EA maintains a single current search point, and in each generation it creates an 
offspring by mutation. The offspring replaces its parent if its fitness is not worse. The 
resulting island model is shown in Algorithm 1.



Algorithm 1: Parallel (1+1) EA with $\mu$ islands and migration probability $p$

For all $1 \le i \le \mu$ choose $x^i \in \{0,1\}^n$ uniformly at random.
repeat
    For all $1 \le i \le \mu$ do in parallel
        Create $y^i$ by flipping each bit in $x^i$ independently with probability $1/n$.
        if $f(y^i) \ge f(x^i)$ then $x^i := y^i$.
        Send a copy of $x^i$ to each neighboring island, independently with prob. $p$.
        Choose $z^i$ with maximum fitness among all incoming migrants.
        if $f(z^i) > f(x^i)$ then $x^i := z^i$.
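Below is a minimal Python sketch of Algorithm 1. It is not the authors' implementation. It assumes a bidirectional ring as the migration topology, $f = \mathrm{OneMax}$, and bit lists as search points. As in the pseudocode, migrants are sent after the mutation step of the same generation, and each island keeps its best immigrant only if that immigrant is strictly better. The function name and the stopping rule (the first generation in which any island holds the all-ones string) are illustrative choices.

```python
import random


def onemax(x):
    """Number of ones in the bit list x."""
    return sum(x)


def parallel_one_plus_one_ea(n, mu, p, max_gens=10**6, seed=None):
    """Simulate Algorithm 1 with f = OneMax on a bidirectional ring of mu islands.

    Returns the number of generations until some island holds the all-ones
    string (the parallel running time), or None if max_gens is exceeded."""
    rng = random.Random(seed)
    islands = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for gen in range(1, max_gens + 1):
        # Standard bit mutation with rate 1/n; keep the offspring if not worse.
        for i in range(mu):
            x = islands[i]
            y = [1 - b if rng.random() < 1.0 / n else b for b in x]
            if onemax(y) >= onemax(x):
                islands[i] = y
        # Each island sends a copy of its search point to each ring neighbour
        # independently with probability p (no self-loops for mu <= 2).
        incoming = [[] for _ in range(mu)]
        for i in range(mu):
            for j in {(i - 1) % mu, (i + 1) % mu} - {i}:
                if rng.random() < p:
                    incoming[j].append(list(islands[i]))
        # Adopt the best immigrant if it is strictly better.
        for i in range(mu):
            if incoming[i]:
                z = max(incoming[i], key=onemax)
                if onemax(z) > onemax(islands[i]):
                    islands[i] = z
        if any(onemax(x) == n for x in islands):
            return gen
    return None
```

For instance, `parallel_one_plus_one_ea(50, 8, 1.0, seed=0)` returns one sample of the parallel running time. Averaging over many seeds gives an empirical estimate of $E(T^{par})$ that can be compared with the bounds derived below.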
3 Proving Upper Bounds for Parallel EAs
3.1 A General Upper Bound 

Now we describe how to prove upper bounds on the running time of parallel EAs. In 
contrast to panmictic EAs, in an island model several islands might participate in the 
search for improvements from the current-best fitness level. The number of islands 
may vary over time according to the spread of information. 

The following theorem transfers upper bounds for panmictic EAs derived by the 
fitness-level method into upper bounds for parallel EAs in a systematic way. 

Theorem 3 (Fitness-level method for parallel EAs). Consider a partition of the search space into fitness levels $A_1 <_f A_2 <_f \dots <_f A_m$ such that $A_m$ only contains global optima. Let $s_i$ be (a lower bound on) the probability that a fixed island running an elitist EA creates a new offspring in $A_{i+1} \cup \dots \cup A_m$, provided the island contains a search point in $A_i$. Let $\mu_t$ for $t \in \mathbb{N}$ denote (a lower bound on) the number of islands that have discovered an individual in $A_i \cup \dots \cup A_m$ in the $t$-th generation after the first island has found such an individual. Then the expected parallel running time of the parallel EA on $f$ is bounded by

$$E(T^{par}) \le \sum_{i=1}^{m-1}\sum_{t=0}^{\infty}(1-s_i)^{\sum_{j=1}^{t}\mu_j}.$$

Proof. Let $T_i$ denote the random time until the first island finds an individual on a fitness level $i+1, \dots, m$, starting with at least one individual on fitness level $i$ in the whole population. The expected parallel running time can be written as

$$E(T^{par}) = \sum_{i=1}^{m-1} E(T_i) = \sum_{i=1}^{m-1}\sum_{t=1}^{\infty}\Pr(T_i \ge t) = \sum_{i=1}^{m-1}\sum_{t=0}^{\infty}\Pr(T_i \ge t+1).$$

A necessary condition for $T_i \ge t+1$ is that during all $t$ generations after the first individual has reached fitness level $i$ all islands are unsuccessful in finding an improvement. In the $j$-th of these generations there are at least $\mu_j$ islands, each being successful with probability at least $s_i$. Using that the islands create new offspring independently, the probability of all islands being unsuccessful is at most $(1-s_i)^{\mu_j}$. Thus,

$$E(T^{par}) \le \sum_{i=1}^{m-1}\sum_{t=0}^{\infty}\prod_{j=1}^{t}(1-s_i)^{\mu_j} = \sum_{i=1}^{m-1}\sum_{t=0}^{\infty}(1-s_i)^{\sum_{j=1}^{t}\mu_j}. \qquad \square$$

The upper bound from Theorem 3 is very general as it does not restrict the communication among the islands in any way. These aspects are hidden in the definition of the variables $\mu_t$. When looking at one particular fitness level, say level $i$, we also speak of islands being informed if and only if they contain an individual on level $i$. The variable $\mu_t$ then gives the number of informed islands $t$ generations after the first island has been informed.
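The bound of Theorem 3 can be evaluated numerically once a lower bound on the growth curve $\mu_t$ is fixed. The Python sketch below truncates the inner infinite sum once a term falls below a tolerance, so it returns an approximation from below of the bound. It uses two hypothetical growth curves. The first is a single island ($\mu_t = 1$), which recovers $\sum_i 1/s_i$ from Theorem 2. The second is a bidirectional ring with $p = 1$, where at least $\min\{2t+1, \mu\}$ islands are informed in the $t$-th generation. The ring curve assumes that migration happens in the same generation in which a level is first reached, as in Algorithm 1.

```python
def theorem3_bound(s, mu_of_t, tol=1e-12, max_t=10**7):
    """Approximate sum_i sum_{t>=0} (1 - s_i)^(mu_1 + ... + mu_t).

    s        -- success probabilities s_1..s_{m-1}, each in (0, 1]
    mu_of_t  -- function t -> lower bound on informed islands (t >= 1)
    The inner sum is truncated once a term drops below tol."""
    total = 0.0
    for s_i in s:
        q = 1.0 - s_i
        exponent = 0  # running sum mu_1 + ... + mu_t
        term = 1.0    # t = 0: empty exponent
        t = 0
        while term >= tol and t < max_t:
            total += term
            t += 1
            exponent += mu_of_t(t)
            term = q ** exponent
    return total


def single_island(t):
    """Growth curve of a single island: always exactly one informed island."""
    return 1


def ring_p1(mu):
    """Hypothetical lower bound for a bidirectional ring with p = 1."""
    def mu_of_t(t):
        return min(2 * t + 1, mu)
    return mu_of_t


s = [0.5, 0.25, 0.01]
print(theorem3_bound(s, single_island))  # approximately 1/0.5 + 1/0.25 + 1/0.01 = 106
print(theorem3_bound(s, ring_p1(16)))
```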

The spread of information obviously depends on the migration topology, the mi- 
gration interval, and the selection strategies used to choose migrants that are sent and 
how migrants are included in the population. The basic method works for all choices of 
these design aspects. We elaborate on these aspects and then move on to more specific 
scenarios where we can obtain more concrete results. 
3.2 How to Deal with Migration Intervals

With a migration interval of $\tau > 1$ the $\mu_t$-value remains fixed for periods of $\tau$ generations. For appropriate $t$ then $\mu_t = \mu_{t+1} = \dots = \mu_{t+\tau-1}$. As the $\mu_t$-values are non-decreasing with $t$, the sum of the first $t\tau$ $\mu$-values satisfies $\sum_{j=1}^{t\tau}\mu_j \ge \tau\sum_{j=1}^{t}\mu_{(j-1)\tau+1}$. This implies the following simplified upper bound.

Corollary 1. For a parallel EA with migration interval $\tau$ the bound from Theorem 3 simplifies to

$$E(T^{par}) \le \sum_{i=1}^{m-1}\sum_{t=0}^{\infty}\tau\,(1-s_i)^{\tau\sum_{j=1}^{t}\mu_{(j-1)\tau+1}}.$$

The values $\mu_{(j-1)\tau+1}$ can be estimated like the values $\mu_j$ in a setting with $\tau = 1$. In order to keep the presentation simple, in the following applications we only consider the case $\tau = 1$, i.e., migration happens in every generation. This reflects common principles used in fine-grained or cellular evolutionary algorithms. The following considerations can always be combined with the above arguments to handle migration intervals larger than 1.
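One way to arrive at a bound of the form in Corollary 1 is to group the generations into blocks of $\tau$ consecutive generations. The following derivation sketch assumes only that the $\mu_t$ are non-decreasing in $t$, as stated above. It is written out for a single level $i$; summing over $i = 1, \dots, m-1$ gives a bound of this form.

```latex
% Blocks of tau consecutive generations: t = r\tau, ..., r\tau + \tau - 1 (r >= 0).
% Within block r every exponent is at least \sum_{j=1}^{r\tau}\mu_j, since mu_j >= 0.
\sum_{t=0}^{\infty}(1-s_i)^{\sum_{j=1}^{t}\mu_j}
  \;\le\; \sum_{r=0}^{\infty}\tau\,(1-s_i)^{\sum_{j=1}^{r\tau}\mu_j}
% The mu_j are non-decreasing, so the tau values with indices (j-1)\tau+1, ..., j\tau
% sum to at least \tau\,\mu_{(j-1)\tau+1}.
  \;\le\; \sum_{r=0}^{\infty}\tau\,(1-s_i)^{\tau\sum_{j=1}^{r}\mu_{(j-1)\tau+1}}
```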

3.3 Stochastic Communication and Finding Improvements 

In order to arrive at more concrete bounds on the parallel running time for common migration topologies, we need to understand how the number of informed islands grows on each fitness level, i.e., the growth curves underlying the $\mu_j$-variables. Note that these variables are random variables in all settings where we have a transmission probability less than 1. This means that getting a closed formula for the expected parallel running time is not easy. In Theorem 3 we cannot simply replace the $\mu_j$-variables by their expectations: as $x \mapsto (1-s_i)^x$ is convex, Jensen's inequality shows that this would yield an estimation in the wrong direction (i.e., it would give a lower bound where an upper bound is needed). More work is required in order to arrive at closed formulas for common topologies.

Instead of arguing with the random number of informed islands, it is easier to 
argue with expected hitting times for the time until a specified number of islands is 
informed. If we know such expected hitting times, or upper bounds thereof, we can 
estimate the time until the parallel EA finds a better fitness level. 

Lemma 1. Consider an island model running elitist EAs and fix some fitness level $i$ with success probability $s_i$ for each island. Let $\xi(k)$ denote the random number of generations until at least $k$ islands are informed. Then for every $k \le \mu$ the expected time until this fitness level is left towards a better one is at most

$$E(\xi(k)) + 1 + \frac{1}{k s_i}.$$

Proof. After $E(\xi(k))$ expected generations there are at least $k$ informed islands. Then the probability of leaving the fitness level is at least $1 - (1-s_i)^k$ and the expected time is bounded by

$$\frac{1}{1-(1-s_i)^k} \le 1 + \frac{1}{k s_i}, \qquad (1)$$

where the inequality is due to Jon Rowe [29, Lemma 3], stated as Lemma 4 in the appendix. Together, this proves the claim. □

A good choice for $k$ is one where $E(\xi(k)) \approx \frac{1}{k s_i}$, as this is likely to minimize the bound from Lemma 1, at least asymptotically.
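To see how the choice of $k$ in Lemma 1 balances the two terms, consider a hypothetical ring with transmission probability $p = 1$. There at least $2t+1$ islands are informed after $t$ generations, so $\xi(k) \le \lceil (k-1)/2 \rceil$ deterministically. The Python sketch below minimizes the Lemma 1 bound over $k$ and also spot-checks inequality (1), $\frac{1}{1-(1-s)^k} \le 1 + \frac{1}{ks}$. All names are illustrative.

```python
import math


def rowe_lhs(s, k):
    """Expected waiting time 1 / (1 - (1 - s)^k) when k islands each succeed w.p. s."""
    return 1.0 / (1.0 - (1.0 - s) ** k)


def lemma1_bound(expected_xi, s, k):
    """Lemma 1 bound E(xi(k)) + 1 + 1 / (k s)."""
    return expected_xi(k) + 1.0 + 1.0 / (k * s)


def best_k(expected_xi, s, mu):
    """Return (k, bound) minimising the Lemma 1 bound over 1 <= k <= mu."""
    return min(((k, lemma1_bound(expected_xi, s, k)) for k in range(1, mu + 1)),
               key=lambda pair: pair[1])


def ring_xi(k):
    """Hypothetical ring with p = 1: k islands informed after ceil((k-1)/2) generations."""
    return math.ceil((k - 1) / 2)


k, bound = best_k(ring_xi, 1e-4, mu=1000)
print(k, bound)
```

For $s_i = 10^{-4}$ the minimizing $k$ lies close to $\sqrt{2/s_i} \approx 141$, where both terms are roughly equal. The resulting bound of about $142$ generations is far below the $1/s_i = 10^4$ expected generations of a single island.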
The lemma ignores the fact that during the first $\xi(k)$ generations islands can already find improvements. It also ignores that the number of informed islands might grow beyond $k$ after this time. However, we will see that for appropriate choices of $k$, the lemma still gives near-optimal results. In the first generations the number of islands is likely to be too small anyway to yield a significant benefit. In addition, after $k$ islands have been informed this number is large enough to guarantee that improvements are found quickly, for appropriate $k$.

3.4 Information Propagation in Networks 

It remains to estimate the first hitting time for informing a certain number of vertices. Note that this is similar to studying growth curves and takeover times. In fact, $E(\xi(\mu))$ is the expected time until the whole island model is informed. Growth curves and takeover times have been studied in artificial settings where no variation takes place, see [3, 8, 11, 30, 32] or recent surveys [23, Chapter 4], [34].

In the following, we refer to our model of transmission probabilities as it is a gen- 
eral model that captures many stochastic components in the dynamic behavior of island 
models. But at the same time it is simple enough to allow for a theoretical analysis. 

Transmission probabilities give rise to a stochastic information propagation pro- 
cess in networks. Each informed vertex in the network independently tries to inform 
all its neighbors in every iteration, and information is successfully transmitted across 
any of these edges with probability p. This process was studied by Rowe, Mitavskiy, 
and Cannings [28], who considered the propagation time as the time until all vertices in the network are informed. They presented bounds for interesting graph classes as well as a general upper bound of

$$\frac{8\,\mathrm{diam}(G) + 8\log n}{p(1-e^{-1})}$$

for the propagation time on an undirected graph $G$. Here $\mathrm{diam}(G)$ denotes the diameter of $G$, defined as the maximum number of edges on any shortest path between two vertices in the graph.
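Below is a minimal simulation of this propagation process. It assumes an undirected graph given by adjacency lists; the names `propagation_time` and `ring` are hypothetical. Vertices informed in an iteration only start transmitting in the next iteration, and every informed vertex tries each uninformed neighbor independently with probability $p$.

```python
import random


def propagation_time(adj, start, p, rng=None, max_iters=10**6):
    """Number of iterations until all vertices are informed, or None.

    adj   -- adjacency lists of an undirected graph on vertices 0..n-1
    start -- the initially informed vertex v*
    p     -- transmission probability per edge and iteration"""
    if rng is None:
        rng = random.Random()
    n = len(adj)
    informed = {start}
    if len(informed) == n:
        return 0
    for it in range(1, max_iters + 1):
        # Each informed vertex tries every uninformed neighbour independently;
        # newly informed vertices transmit from the next iteration on.
        newly = {w for v in informed for w in adj[v]
                 if w not in informed and rng.random() < p}
        informed |= newly
        if len(informed) == n:
            return it
    return None


def ring(n):
    """Adjacency lists of an undirected ring (cycle) with n >= 3 vertices."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]


print(propagation_time(ring(50), 0, 1.0))
print(propagation_time(ring(50), 0, 0.3, random.Random(5)))
```

With $p = 1$ the process is deterministic, and the propagation time equals the largest distance from the start vertex, which is 25 on a ring of 50 vertices. For $p < 1$ it is random and never smaller than that distance.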

Interestingly, the same probabilistic process also underlies the way randomized 
search heuristics find shortest paths in weighted undirected graphs. Doerr, Happ, and 
Klein [5, 6] showed that the (1+1) EA can find shortest paths in graphs by simulating the Bellman-Ford algorithm. The task is to find shortest paths from a source $v^*$ to all other vertices. For vertices whose shortest paths have few edges, shortest paths are found quickly. In our language these vertices would be called informed. If $u$ is informed and the graph contains an edge $\{u, v\}$, then $v$ can become informed with a fixed probability during a lucky mutation, if the shortest path from $v^*$ to $v$ contains $u$. This way, shortest paths propagate through the graph in the same fashion as information does. The same can be observed for ant colony optimizers [36].

Doerr, Happ, and Klein [5] independently used a different argument for bounding
the expected propagation time. Fix a shortest path in the graph, leading from v* to some 
fixed vertex v. In every generation there is a chance of informing the first uninformed 
vertex on the path, until eventually the information reaches v. If the path has at least 
log n edges, the time until v is informed is highly concentrated. Using tail bounds, the 
probability of significantly exceeding the expectation is very small. This allows us to 
apply a union bound for all considered vertices v. 
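The propagation process is easy to simulate. The following Python sketch assumes synchronous rounds in which every informed vertex independently tries to inform each uninformed neighbor with probability p > 0; the graph representation (a dict of adjacency lists) and all function names are illustrative choices, not taken from [5] or [28].

```python
import random


def propagation_time(adj, source, p, rng, target=None):
    """Rounds until at least `target` vertices are informed (default: all).

    adj    -- dict mapping each vertex to a list of its neighbors (undirected)
    source -- the initially informed vertex v*
    p      -- transmission probability per edge and round (assumed > 0)
    rng    -- a random.Random instance, so that runs are reproducible
    """
    if target is None:
        target = len(adj)
    informed = {source}
    rounds = 0
    while len(informed) < target:
        rounds += 1
        newly_informed = set()
        for u in informed:
            for w in adj[u]:
                # every informed neighbor of w gets its own independent attempt
                if w not in informed and rng.random() < p:
                    newly_informed.add(w)
        informed |= newly_informed
    return rounds


def ring(n):
    """Bidirectional ring (cycle) on the vertices 0, ..., n-1."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


rng = random.Random(1)
samples = [propagation_time(ring(64), 0, 0.5, rng) for _ in range(100)]
print("average propagation time on a 64-vertex ring, p = 0.5:",
      sum(samples) / len(samples))
```

With p = 1 the process is deterministic and the propagation time equals the largest distance from $v^*$, e.g. 32 rounds on a 64-vertex ring, which makes a convenient sanity check.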

Following the proof of [5, Lemma 3], we get the following lemma. An advantage over the general bound from [28] is that it bounds not only the propagation time for the whole network, but also the expected hitting times for informing smaller numbers of vertices.

Lemma 2. Consider propagation with transmission probability p on any undirected graph where initially a single vertex $v^*$ is informed. For $i \in \mathbb{N}_0$ let $V_i$ contain all vertices v whose shortest path from $v^*$ to v contains i edges. Let $S_k := \sum_{i=1}^{k} |V_i|$. The probability of not having informed $S_k$ vertices in time $\lambda k/p$, $\lambda \ge 2$, is at most

$$S_k \cdot \exp\left(-\frac{(\lambda-1)^2}{2\lambda} \cdot k\right) \le S_k \cdot \exp\left(-\frac{\lambda k}{8}\right).$$

The expected time until $S_k$ vertices are informed is at most

$$\frac{c}{c-1} \cdot \frac{\max\{2k, 8\ln(cS_k)\}}{p}$$

for every c > 1.

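Proof. The first claim follows from the proof of Lemma 3 in [5] and the fact that $(\lambda-1)^2/\lambda \ge \lambda/4$ for $\lambda \ge 2$.

If $8/k \cdot \ln(cS_k) \ge 2$ we use $\lambda := 8/k \cdot \ln(cS_k)$ and have that after $\lambda k/p$ iterations the probability of not having informed $S_k$ vertices is at most

$$S_k \cdot \exp(-\ln(cS_k)) = \frac{1}{c}.$$

If not, we repeat the argument with another phase of $\lambda k/p$ iterations. As each phase is successful with probability at least $1 - 1/c$, the expected propagation time is at most

$$\frac{1}{1-1/c} \cdot \frac{\lambda k}{p} = \frac{c}{c-1} \cdot \frac{8\ln(cS_k)}{p}.$$

If $8/k \cdot \ln(cS_k) < 2$ then $k/4 > \ln(cS_k)$. The first statement with $\lambda := 2$ then gives a probability bound of

$$S_k \cdot \exp\left(-\frac{k}{4}\right) \le S_k \cdot \exp(-\ln(cS_k)) = \frac{1}{c},$$

and using the same arguments as before we get a time bound of

$$\frac{1}{1-1/c} \cdot \frac{\lambda k}{p} = \frac{c}{c-1} \cdot \frac{2k}{p}.$$

Note that putting k := diam(G) and c = 2, we get a bound of

$$\max\left\{\frac{4\operatorname{diam}(G)}{p}, \frac{16\ln(2n)}{p}\right\} \le \frac{4\operatorname{diam}(G) + 11.2\log(n) + 11.2}{p},$$

using $S_{\operatorname{diam}(G)} \le n$ and $16\ln(2n) = 16\ln(2) \cdot (\log(n) + 1)$ with $16\ln 2 < 11.2$. □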

For all non-empty graphs this is better than the general upper bound

$$\frac{8\operatorname{diam}(G) + 8\log n}{p(1-e^{-1})} \approx \frac{12.7\operatorname{diam}(G) + 12.7\log n}{p}$$

from Rowe, Mitavskiy, and Cannings [28]. However, the asymptotic behavior of both bounds is the same, as $(x+y)/2 \le \max\{x, y\} \le x + y$ for all $x, y \in \mathbb{R}_0^+$, hence $\max\{x, y\} = \Theta(x+y)$.
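As a numerical illustration, the sketch below evaluates the time bound derived in the proof of Lemma 2, $\frac{c}{c-1}\cdot\max\{2k, 8\ln(cS_k)\}/p$, with $k = \operatorname{diam}(G)$, $S_k \le n$ and c = 2 on bidirectional rings, and compares it with the simplified form and with the bound of Rowe, Mitavskiy, and Cannings. We assume that log denotes the binary logarithm in both bounds; the function names are ours.

```python
import math


def lemma2_time_bound(k, s_k, p, c=2.0):
    """c/(c-1) * max{2k, 8 ln(c S_k)} / p, the time bound from the proof of Lemma 2."""
    return c / (c - 1) * max(2 * k, 8 * math.log(c * s_k)) / p


def simplified_bound(diam, n, p):
    """(4 diam(G) + 11.2 log2(n) + 11.2) / p, i.e. Lemma 2 with k = diam(G), c = 2."""
    return (4 * diam + 11.2 * math.log2(n) + 11.2) / p


def rmc_bound(diam, n, p):
    """(8 diam(G) + 8 log2(n)) / (p (1 - 1/e)), assuming a binary logarithm."""
    return (8 * diam + 8 * math.log2(n)) / (p * (1 - math.exp(-1)))


# a bidirectional ring with n vertices has diameter n // 2
for n in (16, 64, 256):
    d = n // 2
    print(n, round(lemma2_time_bound(d, n, 1.0), 1),
          round(simplified_bound(d, n, 1.0), 1), round(rmc_bound(d, n, 1.0), 1))
```

On rings the diameter term dominates, and the leading constant 4 versus roughly 12.7 explains most of the gap between the two bounds.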

Now we are prepared to analyze parallel EAs with concrete topologies. 
4 Parallel EAs with Ring Structures

We start with ring graphs as they are often used as topologies p8| . Rings can either be 
unidirectional, in which case there is exactly one directed cycle, or bidirectional, when 
all edges are undirected. The following theorem holds for both kinds of graphs, and 
in fact for all strongly connected graphs. Recall that a directed graph is called strongly connected if for every two vertices u, v there is a directed path from u to v (implying that there is also a path from v to u).

Theorem 4. Consider an island model running elitist EAs on a function f with a fitness-level partition $A_1 <_f \dots <_f A_m$ and success probabilities $s_1, \dots, s_{m-1}$. Let p be (a lower bound on) the probability that a specific island on fitness level i informs a specific neighbor in the topology in one generation. The expected parallel running time on f with a unidirectional or bidirectional ring and $\mu$ islands (or in fact any strongly connected topology) is bounded by

$$\frac{2}{p^{1/2}} \sum_{i=1}^{m-1} \frac{1}{s_i^{1/2}} + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}.$$

The expected communication effort for ring graphs is by a factor of at most $2p\mu$ larger than the expected parallel time.

The shape of this formula deserves some explanation. The second term $\frac{1}{\mu}\sum_{i=1}^{m-1} \frac{1}{s_i}$ is smaller by a factor of $\mu$ than the upper bound for a single island by Theorem 2. If the latter is asymptotically tight, the second term in Theorem 4, regarded in isolation, would give a perfect linear speedup. The first term is related to the speed at which information is propagated through the island model. Unlike the second term, it is independent of $\mu$, but it depends on the transmission probability p. We do have a linear speedup if the first term $\frac{2}{p^{1/2}}\sum_{i=1}^{m-1} \frac{1}{s_i^{1/2}}$ asymptotically does not grow faster than the second term, again assuming that the bound for a single island is tight.

As $\mu$ grows, the second term becomes smaller, while the first term remains fixed. So if we have a linear speedup for small $\mu$, there is a point where, with growing $\mu$, the linear speedup disappears. This threshold can be easily computed by checking which value of $\mu$ gives rise to the first and second terms being of equal asymptotic order. As will be seen in the next sections, the same also holds for other migration topologies.
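The threshold can also be computed numerically. The sketch below evaluates both terms of Theorem 4 for OneMax, using the standard fitness-level bounds $s_i = (n-i)/(en)$ (as in the proof of Theorem 5), and returns the number of islands at which the two terms coincide; beyond this value, adding islands no longer improves the bound linearly. The function names are illustrative.

```python
import math


def onemax_success_probs(n):
    """Lower bounds s_i = (n - i) / (e n) for the fitness levels i = 0, ..., n-1."""
    return [(n - i) / (math.e * n) for i in range(n)]


def ring_bound_terms(s, p):
    """First term of Theorem 4 and the sum of 1/s_i (the second term times mu)."""
    first = 2 / math.sqrt(p) * sum(1 / math.sqrt(si) for si in s)
    sum_inv = sum(1 / si for si in s)
    return first, sum_inv


def balancing_mu(s, p):
    """Number of islands for which both terms of the bound in Theorem 4 are equal."""
    first, sum_inv = ring_bound_terms(s, p)
    return sum_inv / first


for n in (16, 256, 4096):
    print(n, round(balancing_mu(onemax_success_probs(n), 1.0), 2))
```

The computed values grow roughly proportionally to the harmonic number $H_n$, i.e., logarithmically in n, and they shrink with $p^{1/2}$ for smaller transmission probabilities.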

Proof of Theorem 4. For the unidirectional ring we have $E(\xi(k)) \le (k-1)/p$ since a new island is informed with probability at least p. As this happens independently in each generation, the expected waiting time until this happens is at most 1/p. In fact, this argument holds for all strongly connected topologies and in particular for the bidirectional ring.

Now, if $1 \le k := p^{1/2}/s_i^{1/2} \le \mu$ (ignoring rounding issues), by Lemma 1 the expected number of generations on fitness level i is bounded by

$$\frac{k-1}{p} + 1 + \frac{1}{k} \cdot \frac{1}{s_i} \le \frac{k}{p} + \frac{1}{k s_i} = \frac{1}{p^{1/2}s_i^{1/2}} + \frac{1}{p^{1/2}s_i^{1/2}} = \frac{2}{p^{1/2}s_i^{1/2}}.$$

In case $p^{1/2}/s_i^{1/2} < 1$ we trivially get an upper bound of

$$\frac{1}{s_i} \le \frac{1}{p^{1/2}s_i^{1/2}}.$$

If $p^{1/2}/s_i^{1/2} > \mu$, Lemma 1 for $k := \mu$ gives an upper bound of

$$\frac{\mu-1}{p} + 1 + \frac{1}{\mu} \cdot \frac{1}{s_i} \le \frac{1}{p^{1/2}s_i^{1/2}} + \frac{1}{\mu} \cdot \frac{1}{s_i}.$$

Taking the maximum of the above upper bounds gives

$$\max\left\{\frac{2}{p^{1/2}s_i^{1/2}}, \frac{1}{p^{1/2}s_i^{1/2}} + \frac{1}{\mu}\cdot\frac{1}{s_i}\right\} \le \frac{2}{p^{1/2}s_i^{1/2}} + \frac{1}{\mu}\cdot\frac{1}{s_i}.$$

Summing over all fitness levels proves the claim. □
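The case analysis in this proof can be mirrored numerically. The sketch below, which like the proof ignores rounding of k, evaluates the per-level bound in the three cases and checks with a ternary search over real-valued k that $k/p + 1/(k s_i)$ (an upper bound on $(k-1)/p + 1 + 1/(k s_i)$ for $p \le 1$) is minimized at $k = (p/s_i)^{1/2}$ with value $2/(p^{1/2}s_i^{1/2})$. The function names are ours.

```python
import math


def ring_level_bound(p, s, mu):
    """Per-level bound from the proof of Theorem 4 (k treated as a real number)."""
    k_opt = math.sqrt(p / s)
    if k_opt < 1:                   # information spreads too slowly: trivial 1/s_i
        return 1 / s
    if k_opt <= mu:                 # choose k = (p / s_i)^(1/2)
        return 2 / math.sqrt(p * s)
    return mu / p + 1 / (mu * s)    # k_opt > mu: use all mu islands, k = mu


def minimize_convex(f, lo, hi, iterations=200):
    """Ternary search for the minimum value of a convex function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)


p, s, mu = 0.5, 1e-4, 1000
numeric = minimize_convex(lambda k: k / p + 1 / (k * s), 1, mu)
print(numeric, 2 / math.sqrt(p * s))
```

With integer k the values can exceed $2/(p^{1/2}s_i^{1/2})$ slightly, which is why the proof states that rounding issues are ignored.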

As remarked in the proof, the bound from Theorem 4 holds for arbitrary strongly connected topologies, as the unidirectional ring is a worst case for the values $E(\xi(k))$.

For bidirectional rings we have $E(\xi(k)) \le k/(2p)$ (as can be seen from applying Johannsen's drift theorem, stated in the appendix as Theorem 14, to the difference to k informed vertices, using $h(1) = p$ and $h(x) = 2p$ for $x > 1$ as drift function). This decreases the constant 2 in the first term towards $\sqrt{2}$, at the expense of an additional term $m - 1$.

Also note that if $p < s_i$ then the trivial bound $1/s_i$ gives a better estimate for the time until this fitness level is left. If this holds for all fitness levels, parallelization does not give any provable speedups, as information is propagated too slowly.

Conversely, if, say, $p = \Omega(1)$, then compared to a single island, in a ring the expected waiting time for every fitness level can be replaced by its square root. This can yield significant speedups. We make this precise for concrete functions in the following theorem. For comparing these times with running time bounds for the (1+1) EA we refer to Table 1.

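Theorem 5. The following holds for the parallel (1+1) EA with transmission probability at least p on a unidirectional or bidirectional ring (or any other strongly connected topology):

• $E(T^{\mathrm{par}}) = O\left(\frac{n}{p^{1/2}} + \frac{n\log n}{\mu}\right)$ for OneMax,

• $E(T^{\mathrm{par}}) = O\left(\frac{d n^{1/2}}{p^{1/2}} + \frac{dn}{\mu}\right)$ for every unimodal function with d + 1 function values,

• $E(T^{\mathrm{par}}) = O\left(\frac{n^{k/2}}{p^{1/2}} + \frac{n^k}{\mu}\right)$ for $\mathrm{Jump}_k$ with $k \ge 2$.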



Proof. For OneMax we choose the canonical partition $A_i := \{x \mid \mathrm{OneMax}(x) = i\}$. The probability of increasing the current fitness from fitness level i is at least $s_i \ge (n-i) \cdot 1/(en)$ since there are $n - i$ Hamming neighbors of larger fitness and a specific Hamming neighbor is created with probability at least $1/n \cdot (1-1/n)^{n-1} \ge 1/(en)$. The second sum in Theorem 4 is

$$\frac{1}{\mu}\sum_{i=0}^{n-1} \frac{en}{n-i} = \frac{en}{\mu}\sum_{i=1}^{n} \frac{1}{i} = O\left(\frac{n\log n}{\mu}\right).$$

The first sum in Theorem 4 is

$$\frac{2}{p^{1/2}}\sum_{i=0}^{n-1}\left(\frac{en}{n-i}\right)^{1/2} = 2\left(\frac{en}{p}\right)^{1/2}\sum_{i=1}^{n}\frac{1}{i^{1/2}} \le 2\left(\frac{en}{p}\right)^{1/2} \cdot 2n^{1/2} = \frac{4e^{1/2}n}{p^{1/2}}.$$
For unimodal functions we choose a partition $A_1, \dots, A_{d+1}$ where $A_i$ contains all search points with the i-th smallest function value. The probability of improving the fitness from level i is at least $s_i \ge 1/(en)$ because there is at least one search point in the next fitness level which is at Hamming distance one. Theorem 4 gives an upper bound of

$$\frac{2}{p^{1/2}}\sum_{i=1}^{d}(en)^{1/2} + \frac{1}{\mu}\sum_{i=1}^{d}en = 2d\left(\frac{en}{p}\right)^{1/2} + \frac{den}{\mu} = O\left(\frac{dn^{1/2}}{p^{1/2}} + \frac{dn}{\mu}\right).$$

For $\mathrm{Jump}_k$ functions and $i \notin \{n-k, n\}$ the fitness levels $A_i$ are chosen similarly to OneMax, yielding the same terms in the upper bound as for OneMax. (In fact, results are even better as the hardest fitness levels for OneMax are replaced by easy fitness levels.) But to reach the highest level from $n-k$ 1-bits, i.e., level $A_n$, a specific bit string with Hamming distance k has to be created. This has probability at least

$$\left(\frac{1}{n}\right)^{k}\left(1-\frac{1}{n}\right)^{n-k} \ge \frac{1}{en^k}.$$

Theorem 4 and the above bound for OneMax give

$$O\left(\frac{n}{p^{1/2}} + \frac{n\log n}{\mu}\right) + 2\left(\frac{en^k}{p}\right)^{1/2} + \frac{en^k}{\mu} = O\left(\frac{n^{k/2}}{p^{1/2}} + \frac{n^k}{\mu}\right). \qquad \square$$
The speedups obtained are indeed significant, particularly for those functions 
where improvements are hard to find. 

The proof of Theorem |5] uses well-known fitness-level partitions |j7 39 1, and hence 
it simply consists of plugging in known values Si and simplifying. This shows how 
easy it is to obtain results for parallel EAs based on analyses of panmictic EAs. 

By a strange coincidence, the speedups obtained through parallelization on ring graphs are as large as those obtained through quantum search [16].

5 Parallel EAs with Two-Dimensional Grids and Tori 

For two-dimensional grids and tori we adapt Theorem 3 in a similar manner, making
an effort to get the best possible leading constant in the first term of the running time 
bound. We also consider applications of the resulting theorem similar to the applica- 
tions for ring graphs. 

Theorem 6. Consider the setting from Theorem 4. The expected parallel running time of the island model on a grid or torus topology with side lengths $\sqrt{\mu} \times \sqrt{\mu}$ is bounded by

$$\frac{3^{5/3}}{p^{2/3}}\sum_{i=1}^{m-1}\frac{1}{s_i^{1/3}} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}.$$

The expected communication effort is by a factor of at most $4p\mu$ larger than the expected parallel time.

Proof. Note that within a square area of $\sqrt{k} \times \sqrt{k}$ vertices in the graph all shortest paths between any two vertices have at most $2\sqrt{k} - 2$ edges. Applying Lemma 2 with $k' := 2\sqrt{k} - 2$, $S_{k'} \ge k$ and $c = 4$ we have that for every $k \le \mu$ the expected time until k islands are informed is bounded by

$$\max\left\{\frac{8/3 \cdot \sqrt{k} - 8/3}{p}, \frac{32/3 \cdot \ln(4k)}{p}\right\}.$$

We also get an upper bound of $k/(2p)$ using Johannsen's variable drift theorem [15], Theorem 14 in the appendix, as follows. If there is more than one uninformed vertex, there are always at least two vertices neighboring to informed ones. So the number of informed vertices increases by at least 2p in expectation. Applying Johannsen's drift theorem as for the bidirectional ring gives an upper bound of $k/(2p)$. It is easy to check that the best upper bound is as follows: for all $k \in \mathbb{N}$

$$\min\left\{\frac{k}{2p}, \max\left\{\frac{8/3 \cdot \sqrt{k} - 8/3}{p}, \frac{32/3 \cdot \ln(4k)}{p}\right\}\right\} \le \frac{6\sqrt{k} - 1}{p}.$$

Now, if $1 \le k := 3^{-2/3} \cdot (p/s_i)^{2/3} \le \mu$ (ignoring rounding issues), by Lemma 1 the expected number of generations on fitness level i is bounded by

$$\frac{6\sqrt{k} - 1}{p} + 1 + \frac{1}{k}\cdot\frac{1}{s_i} \le \frac{6 \cdot 3^{-1/3}}{p^{2/3}s_i^{1/3}} + \frac{3^{2/3}}{p^{2/3}s_i^{1/3}} = \frac{3^{5/3}}{p^{2/3}s_i^{1/3}}.$$

If $3^{-2/3} \cdot (p/s_i)^{2/3} < 1$ we trivially get an upper bound of

$$\frac{1}{s_i} \le \frac{3^{2/3}}{p^{2/3}s_i^{1/3}}.$$

If $3^{-2/3} \cdot (p/s_i)^{2/3} > \mu$, Lemma 1 for $k := \mu$ gives an upper bound of

$$\frac{6\sqrt{\mu} - 1}{p} + 1 + \frac{1}{\mu}\cdot\frac{1}{s_i} \le \frac{2 \cdot 3^{2/3}}{p^{2/3}s_i^{1/3}} + \frac{1}{\mu}\cdot\frac{1}{s_i}.$$

Taking the maximum of the above upper bounds gives

$$\max\left\{\frac{3^{5/3}}{p^{2/3}s_i^{1/3}}, \frac{2\cdot 3^{2/3}}{p^{2/3}s_i^{1/3}} + \frac{1}{\mu}\cdot\frac{1}{s_i}\right\} \le \frac{3^{5/3}}{p^{2/3}s_i^{1/3}} + \frac{1}{\mu}\cdot\frac{1}{s_i}.$$

Summing over all fitness levels yields the claim. □

Note that the communication effort in one generation is asymptotically as large as for ring graphs, but for large p the parallel running time is generally smaller. If $p < 3s_i$ then again the trivial upper bound $1/s_i$ is better, as then the spread of information is too slow.

Compared to a single island, in a torus the expected waiting time for every fitness level can be replaced by its third root. This leads to improved upper bounds for unimodal functions and $\mathrm{Jump}_k$.

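Theorem 7. The following holds for the parallel (1+1) EA with transmission probability p on a grid or torus topology with side lengths at least $\sqrt{\mu} \times \sqrt{\mu}$:

• $E(T^{\mathrm{par}}) = O\left(\frac{n}{p^{2/3}} + \frac{n\log n}{\mu}\right)$ for OneMax,

• $E(T^{\mathrm{par}}) = O\left(\frac{dn^{1/3}}{p^{2/3}} + \frac{dn}{\mu}\right)$ for every unimodal function with d + 1 function values,

• $E(T^{\mathrm{par}}) = O\left(\frac{n + n^{k/3}}{p^{2/3}} + \frac{n^k}{\mu}\right)$ for $\mathrm{Jump}_k$ with $k \ge 2$.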
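Proof. We choose the same partitions as in the proof of Theorem 5. Note that the second terms in Theorems 4 and 6 are identical, so we only estimate the first terms and refer to Theorem 5 for the second terms.

For OneMax the first sum in Theorem 6 is

$$\frac{3^{5/3}}{p^{2/3}}\sum_{i=0}^{n-1}\left(\frac{en}{n-i}\right)^{1/3} = \frac{3^{5/3}e^{1/3}n^{1/3}}{p^{2/3}}\sum_{i=1}^{n}\frac{1}{i^{1/3}} \le \frac{3^{5/3}e^{1/3}n^{1/3}}{p^{2/3}}\cdot\frac{3}{2}n^{2/3} = \frac{3^{8/3}e^{1/3}}{2}\cdot\frac{n}{p^{2/3}}.$$

This gives an upper bound of

$$O\left(\frac{n}{p^{2/3}} + \frac{n\log n}{\mu}\right).$$

For unimodal functions Theorem 6 gives

$$\frac{3^{5/3}}{p^{2/3}}\sum_{i=1}^{d}(en)^{1/3} + \frac{den}{\mu} = \frac{3^{5/3}e^{1/3}dn^{1/3}}{p^{2/3}} + \frac{den}{\mu} = O\left(\frac{dn^{1/3}}{p^{2/3}} + \frac{dn}{\mu}\right).$$

For $\mathrm{Jump}_k$ we get

$$O\left(\frac{n}{p^{2/3}} + \frac{n\log n}{\mu}\right) + \frac{3^{5/3}e^{1/3}n^{k/3}}{p^{2/3}} + \frac{en^k}{\mu} = O\left(\frac{n + n^{k/3}}{p^{2/3}} + \frac{n^k}{\mu}\right). \qquad \square$$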

6 Parallel EAs with Hypercube Graphs 

Hypercube graphs are popular topologies in parallel computation. In a d-dimensional 
hypercube each vertex has a label of d bits. Two vertices are neighboring if and only if 
their labels differ in exactly one bit. The number of vertices is then $2^d$, and each vertex has d neighbors. The diameter of a d-dimensional hypercube is d, hence only logarithmic in the size of the graph. The small diameter implies that in many communication models information is spread rapidly, even though the degree of vertices is quite small. With regard to the propagation process investigated here, we get a small first term in the following running time bound, and still have a very moderate communication effort.

Theorem 8. Consider the setting from Theorem 4. The expected parallel running time of the island model on a $(\log \mu)$-dimensional hypercube graph with $\mu$ islands is bounded by

$$\frac{49m + 24\sum_{i=1}^{m-1}\log(1/s_i)}{p} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}.$$

The expected communication effort is by a factor of at most $p\mu\log\mu$ larger than the expected parallel time.

Proof. In the notation of Lemma 2 we have for the hypercube and $1 \le k \le \log\mu$

Invoking Lemma 2 with c = 2, the expected time until $2^k$ vertices are informed is therefore at most

$$\frac{\max\{4k, 16\ln(2 \cdot 2^k)\}}{p} = \frac{16\ln(2)\cdot(k+1)}{p} \le \frac{24(k+1)}{p}.$$

By Lemma 1 the expected time on fitness level i is hence bounded, for any integer $0 \le k \le \log\mu$, by

$$\frac{24(k+1)}{p} + 1 + \frac{1}{2^k}\cdot\frac{1}{s_i} \le \frac{25}{p} + \frac{24k}{p} + \frac{1}{2^k}\cdot\frac{1}{s_i}.$$

If $p/(24s_i) < 1$, we get a trivial upper bound of $1/s_i < 24/p$. If $p/(24s_i) > \mu$, which implies $d < \log(p/(24s_i)) < \log(1/s_i)$ for the dimension $d = \log\mu$, choosing $k := d$ gives an upper bound of

$$\frac{25 + 24d}{p} + \frac{1}{\mu}\cdot\frac{1}{s_i} \le \frac{25 + 24\log(1/s_i)}{p} + \frac{1}{\mu}\cdot\frac{1}{s_i}.$$

Otherwise, we choose k such that $2^k = p/(24s_i)$, leading to

$$\frac{25}{p} + \frac{24\log(p/(24s_i))}{p} + \frac{24}{p} \le \frac{49}{p} + \frac{24\log(1/s_i)}{p}.$$

The maximum over all these bounds is at most

$$\frac{49 + 24\log(1/s_i)}{p} + \frac{1}{\mu}\cdot\frac{1}{s_i}.$$

Summing over all fitness levels yields the claim. □
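The hypercube case analysis can likewise be checked numerically. The following sketch takes the best of the per-level bounds $24(k+1)/p + 1 + 1/(2^k s_i)$ over integers $0 \le k \le d$, together with the trivial bound $1/s_i$, and compares it with the per-level term of Theorem 8. Unlike the proof it does not ignore rounding of k; logarithms are binary, and the function names are ours.

```python
import math


def best_level_bound(p, s, d):
    """Smallest per-level bound over integers 0 <= k <= d, or the trivial 1/s."""
    best = 1 / s
    for k in range(d + 1):
        best = min(best, 24 * (k + 1) / p + 1 + 1 / (2 ** k * s))
    return best


def theorem8_level_bound(p, s, mu):
    """Per-level term (49 + 24 log2(1/s)) / p + 1 / (mu s) of Theorem 8."""
    return (49 + 24 * math.log2(1 / s)) / p + 1 / (mu * s)


for d in (2, 6, 10):
    mu = 2 ** d
    print(d, best_level_bound(1.0, 1e-5, d), theorem8_level_bound(1.0, 1e-5, mu))
```

Rounding $\log(p/(24s_i))$ up to an integer k costs at most an additive $24/p$, which the slack in the constant 49 easily absorbs.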

Results for our example applications are as follows. 

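Theorem 9. The following holds for the parallel (1+1) EA with transmission probability p on a $(\log\mu)$-dimensional hypercube:

• $E(T^{\mathrm{par}}) = O\left(\frac{n}{p} + \frac{n\log n}{\mu}\right)$ for OneMax,

• $E(T^{\mathrm{par}}) = O\left(\frac{d\log n}{p} + \frac{dn}{\mu}\right)$ for every unimodal function with d + 1 function values,

• $E(T^{\mathrm{par}}) = O\left(\frac{n + k\log n}{p} + \frac{n^k}{\mu}\right)$ for $\mathrm{Jump}_k$ with $k \ge 2$.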



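Proof. For OneMax we have

$$\sum_{i=0}^{n-1}\log\left(\frac{en}{n-i}\right) = \sum_{i=1}^{n}\log\left(\frac{en}{i}\right) = \log\left(\frac{e^n n^n}{n!}\right) \le \log\left(e^{2n}\right) = 2n\log(e),$$

using $n! \ge (n/e)^n$. Theorem 8 gives an upper bound of

$$\frac{49n + 48n\log e}{p} + O\left(\frac{n\log n}{\mu}\right) = O\left(\frac{n}{p} + \frac{n\log n}{\mu}\right).$$

For unimodal functions Theorem 8 gives

$$\frac{49d + 24d\log(en)}{p} + \frac{den}{\mu} = O\left(\frac{d\log n}{p} + \frac{dn}{\mu}\right).$$

For $\mathrm{Jump}_k$ we get

$$O\left(\frac{n}{p} + \frac{n\log n}{\mu}\right) + \frac{49 + 24k\log(en)}{p} + \frac{en^k}{\mu} = O\left(\frac{n + k\log n}{p} + \frac{n^k}{\mu}\right). \qquad \square$$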

VP / 

If $p = \Omega(1)$ we get linear speedups for OneMax if $\mu = O(\log n)$, and linear speedups for unimodal functions where the bound $O(dn)$ for a single island is tight, if $\mu = O(n/\log n)$. For $\mathrm{Jump}_k$, if $k = O(n/\log n)$ we can choose $\mu = O(n^{k-1})$ to get a linear speedup. As can be seen from Table 1, the expected parallel times for LO and $\mathrm{Jump}_k$ are much better for the hypercube than for rings and torus graphs, if p is large.

7 Parallel EAs with Complete Topologies 

Finally, we consider the densest topology, the complete graph $K_\mu$, where every island is neighboring to every other island. The complete graph is interesting because it represents an extreme case: the largest possible communication costs, but also the fastest possible spread of information.

For the special case of p = 1, a parallel (1+1) EA is basically equivalent to a (1+$\mu$) EA, which creates $\mu$ offspring independently and then compares a best offspring against the current search point. The only difference is that the parallel (1+1) EA can store different individuals of the same fitness. But this issue is irrelevant when using the fitness-level method. Hence our results for a parallel (1+1) EA with a complete topology and p = 1 also apply to the (1+$\mu$) EA. For p < 1 the two models are generally different.

We start with a simple argument. Clearly, if there is at least one informed island, 
each other island will become informed with probability at least p. 

Theorem 10. Consider the setting from Theorem 4. The expected parallel running time of the island model on a complete topology is at most

$$m + \frac{2m}{p} + \frac{2}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}.$$

The expected communication effort is by a factor of at most $p\mu^2$ larger than the expected parallel time.

Proof. We estimate the expected time until at least $\mu/2$ islands are informed after an improvement. If more than $\mu/2$ islands are uninformed, the expected number of islands that become informed in one generation is at least $p\mu/2$. By standard drift analysis arguments [15] the desired expectation is bounded by $2/p$.

By Lemma 1 we then get that the expected time on fitness level i is at most

$$1 + \frac{2}{p} + \frac{2}{\mu}\cdot\frac{1}{s_i}.$$

Adding these times for all fitness levels proves the claim. □
As mentioned, the complete graph leads to a maximal spread of information. In comparison to the previous sections, we obtain the best upper bounds for the considered function classes. However, the communication effort in one generation is also maximal, so the expected total communication costs are also highest (cf. Tables 1 and 2).

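Theorem 11. Let $\mu \in \mathbb{N}$. The following holds for the expected parallel running time of the parallel (1+1) EA with topology $K_\mu$. In the case p = 1, the same holds for the (1+$\mu$) EA:

• $E(T^{\mathrm{par}}) = O\left(\frac{n}{p} + \frac{n\log n}{\mu}\right)$ for OneMax,

• $E(T^{\mathrm{par}}) = O\left(\frac{d}{p} + \frac{dn}{\mu}\right)$ for every unimodal function with d + 1 function values, and

• $E(T^{\mathrm{par}}) = O\left(\frac{n}{p} + \frac{n^k}{\mu}\right)$ for $\mathrm{Jump}_k$ with $k \ge 2$.

The proof follows by combining Theorem 10 with the fitness-level partitions and success probabilities from the proof of Theorem 5.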

The term $2/p$ for the time until at least $\mu/2$ islands are informed is a reasonable estimate if p is large (e.g., $p = \Omega(1)$). But for small p this estimate is quite loose, as we have completely neglected that all informed vertices have a chance to inform other islands.

We therefore also present a more detailed analysis for small p. The motivation for 
studying complete graphs and small p is that it captures random migration policies. 
Assume that each island decides randomly with probability p for each other island 
whether to migrate individuals to that island. Then this can be regarded as a complete 
topology with transmission probability p. 

Values around $p = 1/\mu$ seem particularly interesting, as then in each generation one migration takes place for each island in expectation. There, a change of regime also happens, as we get different results for $p \ge 1/\mu$ and $p < 1/\mu$.

Lemma 3. Consider propagation with transmission probability p on the complete topology with $\mu$ vertices. Let $\xi(k)$ be as in Lemma 1; then

$$E(\xi(\mu)) \le \frac{8\log\mu}{\min(p\mu, 1)}.$$

Proof. Let $X_t$ denote the random number of informed vertices after t iterations. We first estimate the expected time until at least $\mu/2$ vertices become informed, and then estimate how long it takes to get from $\mu/2$ informed vertices to $\mu$ informed ones.

If $X_t = i$, each presently uninformed vertex is informed in one iteration with probability (using Lemma 4)

$$1 - (1-p)^i \ge 1 - \frac{1}{1+ip} = \frac{ip}{1+ip} =: iq.$$

This holds independently from other presently uninformed vertices. In fact, the number of newly informed vertices follows a binomial distribution with parameters $\mu - i$ and $iq$. The median of this binomial distribution is $i(\mu - i)q$ (assuming that this is an integer), hence with probability at least 1/2 we have at least $i(\mu-i)q$ newly informed vertices in one iteration. Hence, it takes an expected number of at most 2 iterations to increase the number of informed vertices by $i(\mu - i) \cdot \frac{p}{1+ip}$, which for $i \le \mu/2$ is at least

$$i \cdot \frac{p\mu}{2 + p\mu}.$$

For every $0 \le j \le \log(\mu) - 2$ the following holds. If $i \ge 2^j$, then in an expected number of 2 generations at least $2^j \cdot \frac{p\mu}{2+p\mu}$ new vertices are informed. The expected number of iterations for informing a total of $2^j$ new vertices is therefore at most $2 \cdot \frac{2+p\mu}{p\mu}$. Then we have gone from at least $2^j$ informed vertices to at least $2^{j+1}$ informed vertices. Summing up all times across all j, the expected time until at least $2^{\log(\mu)-1} = \mu/2$ vertices are informed is at most

$$2(\log(\mu) - 1) \cdot \frac{2 + p\mu}{p\mu}.$$

For $p\mu < 1$ we have $\frac{2+p\mu}{p\mu} < \frac{3}{p\mu}$, yielding an upper time bound of $6(\log(\mu)-1)/(p\mu)$. Otherwise, we use $\frac{2+p\mu}{p\mu} \le 3$ to get a bound of $6(\log(\mu) - 1)$.

For the time to get from $\mu/2$ to $\mu$ informed vertices, observe that the expected number of newly informed vertices is at least $(\mu - i)\cdot\frac{ip}{1+ip}$ if currently i vertices are informed. This function is monotone decreasing if $i \ge \mu/2$. Applying Johannsen's drift theorem, Theorem 14, to the number $x = \mu - i$ of uninformed nodes, using the drift function $h(x) = \frac{x(\mu-x)p}{1+(\mu-x)p}$, gives an upper bound of

$$\frac{1}{h(1)} + \int_1^{\mu/2}\frac{1 + (\mu-x)p}{x(\mu-x)p}\,dx = 1 + \frac{1}{(\mu-1)p} + \frac{\ln(\mu-1)}{p\mu} + \ln\left(\frac{\mu}{2}\right) \le 1 + \frac{2}{p\mu} + \frac{\ln(\mu)}{p\mu} + \ln(\mu) - \ln 2,$$

where we used $\frac{1}{x(\mu-x)} = \frac{1}{\mu}\left(\frac{1}{x} + \frac{1}{\mu-x}\right)$ and $\mu - 1 \ge \mu/2$. For $p\mu < 1$ this is at most

$$\frac{2\ln(\mu) + 3 - \ln 2}{p\mu} \le \frac{2\ln(\mu) + 5/2}{p\mu}.$$

Otherwise, this is at most

$$2\ln(\mu) + 3 - \ln 2 \le 2\ln(\mu) + 5/2.$$

Together, along with $\ln(\mu) \le \log(\mu)$, this proves the claim. □
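Lemma 3 can be compared with a direct simulation. The sketch below simulates propagation on $K_\mu$, where in each round every uninformed vertex is informed with probability $1 - (1-p)^i$ if i vertices are currently informed (each informed vertex makes an independent attempt), and compares the empirical mean with the bound $8\log(\mu)/\min(p\mu, 1)$. The function names are illustrative.

```python
import math
import random


def complete_graph_propagation(mu, p, rng):
    """Rounds until all mu vertices of K_mu are informed, starting from one vertex."""
    informed = 1
    rounds = 0
    while informed < mu:
        rounds += 1
        # an uninformed vertex stays uninformed only if all `informed` attempts fail
        q = 1 - (1 - p) ** informed
        informed += sum(1 for _ in range(mu - informed) if rng.random() < q)
    return rounds


def lemma3_bound(mu, p):
    """8 log2(mu) / min(p mu, 1), the upper bound of Lemma 3."""
    return 8 * math.log2(mu) / min(p * mu, 1)


rng = random.Random(0)
for mu, p in ((64, 1 / 256), (64, 1 / 64), (64, 0.5)):
    runs = [complete_graph_propagation(mu, p, rng) for _ in range(200)]
    print(mu, p, sum(runs) / len(runs), lemma3_bound(mu, p))
```

In such runs the empirical means stay well below the bound, which is expected since Lemma 3 is an upper bound with generous constants.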



Combining Lemma 3 with Lemma 1 gives the following. Apart from an additive term m, the case of $p < 1/\mu$ yields a bound where the first term is smaller by a factor of order $\log(\mu)/\mu$. For fairly large transmission probabilities, $p \ge 1/\mu$, in the first term we have replaced the factor $1/p$ by $\log(\mu)$. These improvements reflect that the complete graph can spread information much more quickly than previously estimated in the proof of Theorem 10.

Theorem 12. Consider the setting from Theorem 4. The expected parallel running time of the island model on a complete topology is bounded as follows. If $p \ge 1/\mu$ we have

$$E(T^{\mathrm{par}}) \le m + 8m\log\mu + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i},$$

and if $p < 1/\mu$ we have

$$E(T^{\mathrm{par}}) \le m + \frac{8m\log\mu}{p\mu} + \frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}.$$

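For our example applications, the refinements in Theorem 12 result in the following refined bounds. As we only get improvements for $p = o(1/\log(\mu))$, we do not mention the special case of the (1+$\mu$) EA with p = 1.

Theorem 13. Let $\mu \in \mathbb{N}$. The following holds for the expected parallel running time of the parallel (1+1) EA with topology $K_\mu$:

• $E(T^{\mathrm{par}}) = O\left(n\log(\mu) + \frac{n\log n}{\mu}\right)$ for OneMax if $p \ge 1/\mu$ and
$E(T^{\mathrm{par}}) = O\left(n + \frac{n\log(\mu)}{p\mu} + \frac{n\log n}{\mu}\right)$ otherwise,

• $E(T^{\mathrm{par}}) = O\left(d\log(\mu) + \frac{dn}{\mu}\right)$ for unimodal functions with d + 1 values if $p \ge 1/\mu$ and
$E(T^{\mathrm{par}}) = O\left(d + \frac{d\log(\mu)}{p\mu} + \frac{dn}{\mu}\right)$ otherwise, and

• $E(T^{\mathrm{par}}) = O\left(n\log(\mu) + \frac{n^k}{\mu}\right)$ for $\mathrm{Jump}_k$ with $k \ge 2$ if $p \ge 1/\mu$ and
$E(T^{\mathrm{par}}) = O\left(n + \frac{n\log(\mu)}{p\mu} + \frac{n^k}{\mu}\right)$ otherwise.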

8 Experiments 

In order to complement the analytical results above, we also give experimental results 
on the behavior of island models for different topologies. As a detailed experimental 
evaluation is beyond the scope of this paper, we only present illustrative results for the 
two functions OneMax and LO. 

First we investigate the parallel running time $T^{\mathrm{par}}$ for different transmission probabilities. The experiments were repeated 100 times per data point for the parallel (1+1) EA with $\mu = 64$ islands and an instance size of n = 256 for all example functions, varying the transmission probability p in steps of 0.01. Figure 1 shows the behavior for the topologies $K_{64}$, a bidirectional ring graph, an $8 \times 8$ torus graph, and a 6-dimensional hypercube.
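For readers who want to run experiments of this kind, the following Python sketch implements one possible version of the parallel (1+1) EA on OneMax and returns one sample of $T^{\mathrm{par}}$, the number of generations until some island holds the optimum. Its migration scheme is an assumption made for illustration: after mutation and selection, every island sends a copy of its current search point to each neighbor independently with probability p, and an island adopts the best migrant it receives if that migrant is at least as fit as its own search point. It is not the code used for the experiments reported here, and the function names are ours.

```python
import random


def ring_neighbours(mu):
    """Bidirectional ring topology: island j sends to j - 1 and j + 1 (mod mu)."""
    return [[(j - 1) % mu, (j + 1) % mu] for j in range(mu)]


def parallel_one_plus_one_ea(n, neighbours, p, rng, max_gens=10**6):
    """Generations until some island finds the OneMax optimum (None if max_gens hit)."""
    mu = len(neighbours)
    fitness = sum  # OneMax: number of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    if any(fitness(x) == n for x in pop):
        return 0
    for gen in range(1, max_gens + 1):
        # (1+1) EA step on every island: standard bit mutation, elitist selection
        for j in range(mu):
            child = [bit ^ (rng.random() < 1 / n) for bit in pop[j]]
            if fitness(child) >= fitness(pop[j]):
                pop[j] = child
        if any(fitness(x) == n for x in pop):
            return gen
        # migration: each directed edge transmits a copy with probability p
        incoming = [[] for _ in range(mu)]
        for j in range(mu):
            for target in neighbours[j]:
                if rng.random() < p:
                    incoming[target].append(pop[j])
        for j in range(mu):
            if incoming[j]:
                best = max(incoming[j], key=fitness)
                if fitness(best) >= fitness(pop[j]):
                    pop[j] = list(best)
    return None


rng = random.Random(7)
print(parallel_one_plus_one_ea(64, ring_neighbours(8), 1.0, rng))
```

Other topologies only require a different neighbor list, e.g. `[[t for t in range(mu) if t != j] for j in range(mu)]` for the complete graph; setting p = 0 gives $\mu$ independent runs of the (1+1) EA.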

Looking at the influence of the transmission probability on the running time, a higher transmission probability improves the running time behavior of the algorithm, in line with the expectations from our theoretical analysis. In particular, all values of p that are not too small lead to much smaller running times compared to the setting p = 0, i.e., $\mu$ independent runs of the (1+1) EA. This demonstrates for our functions that parallelization and migration can lead to drastic speedups. For intermediate or larger values of p the parallel running time does not vary much, as then for all topologies the running time is dominated by the second terms from our bounds: $\frac{1}{\mu}\cdot O(n\log n)$ and $\frac{1}{\mu}\cdot O(n^2)$ for OneMax and LO, respectively.

Comparing the behavior of those topologies, we see that the parallel running time indeed depends on the density of the topology, i.e., denser topologies spread information more efficiently, which results in faster convergence. As expected, the topology $K_{64}$ performs best and the ring graph performs worst.

Figure 1: Average parallel running time for the parallel (1+1) EA using $\mu = 64$ islands and different transmission probabilities, both for OneMax and LO on n = 256 bits.

Next we investigate the impact of the number of islands on performance, with regard to different topologies and transmission probabilities; see Figures 2(a) and 2(b) for a transmission probability p = 1.0 and Figures 2(c) and 2(d) for a transmission probability p = 0.1. As the parallel running time shows a steep decrease, we plot the efficiency instead, defined as

$$\frac{T^{\mathrm{seq}}}{\mu \cdot T^{\mathrm{par}}}.$$

It can be regarded as a normalized version of speedup, normalized by the number of islands. Small efficiencies indicate small speedups, large efficiencies indicate good speedups. An efficiency of 1 corresponds to a perfect linear speedup.

Again, the instance size of the benchmark functions was set to n = 256 and the number of islands $\mu$ was chosen from 1 to 64. Only square torus graphs were used, so our torus graphs and hypercubes are only defined for square numbers and powers of 2, respectively, leading to fewer data points. For lower numbers of islands the efficiency of the algorithm is better than for larger numbers of islands. This is somewhat expected, as a single (1+1) EA, i.e., our setting with $\mu = 1$, minimizes the number of function evaluations for both OneMax and LO [35] among all EAs that only use standard bit mutation. This excludes superlinear speedups on OneMax and LO for such EAs.

It can be seen that denser topologies are more efficient than sparse topologies. Also, the efficiency decreases with a higher number of islands. In accordance with our theoretical analyses, the efficiency decreases more rapidly for OneMax. For OneMax and $p = \Omega(1)$, only values $\mu = O(\log n)$ were guaranteed to give a linear speedup. And indeed the efficiency in Figure 2(c) degrades quite quickly for OneMax and p = 1.

Higher numbers of islands are still efficient for LO. For the ring, the range of good $\mu$-values is up to $\mu = O(\sqrt{n})$. This is reflected in Figure 2(d), as the efficiency degrades as $\mu$ increases beyond $\sqrt{n} = 16$. For denser topologies the efficiency only degrades for large $\mu$. The complete graph remains effective throughout the whole scale; even stronger, for values up to $\mu = 256$ (not shown in Figure 2) the efficiency was always above 0.75. This was also expected, as $\mu = O(n)$ still guarantees a linear speedup for LO.
Figure 2: Efficiency for the parallel (1+1) EA with transmission probabilities $p \in \{0.1, 1\}$ for $\mu \in \{1, \dots, 64\}$ islands.



Comparing the running time behavior for different transmission probabilities, the plots confirm again that in our examples a higher transmission probability for individuals allows for a better overall performance.

9 Conclusions 

We have provided a new method for the running time analysis of parallel evolution- 
ary algorithms, including applications to a set of well-known and illustrative example 
functions. Our method provides a way of automatically transforming running time 
bounds obtained for panmictic EAs to parallel EAs with spatial structures. In addition 
to a general result, we have provided methods tailored towards specific topologies: 
ring graphs, torus graphs, hypercubes and complete graphs. The latter also covers off- 
spring populations and random migration topologies as special cases. Our results can 
estimate the expected parallel running time, and hence the speedup obtained through 
parallelization. They also bound the expected total communication effort in terms of 
the total number of individuals migrated. 

Our example applications revealed insights which are remarkable in their own right; see Table 1 and a more general version in Table 2. Compared to upper bounds obtained for a single panmictic island by the fitness-level method, for ring graphs the expected waiting time for an improvement can be replaced by its square root in the parallel running time, provided the number of islands is large enough and improvements are transmitted efficiently, i.e., $p = \Omega(1)$. This leads to a speedup of order $\log n$ for OneMax and of order $\sqrt{n}$ for unimodal functions like LO. On $\mathrm{Jump}_k$, the speedup is even of order $n^{k/2}$. A similar effect is observed for torus graphs, where the expected waiting time can be replaced by its third root. The hypercube reduces the expected waiting time on each level to its logarithm, and on the complete graph it is reduced to a constant, again provided there are sufficiently many islands. This way, even on functions like LO and $\mathrm{Jump}_k$ ($3 \le k = O(n/\log n)$) the expected parallel time can be reduced to $O(n)$. In all these results the population size can be chosen in such a way that the total number of function evaluations does not increase, in an asymptotic sense. The "optimal" population sizes have been stated explicitly (cf. Tables 1 and 2), therefore giving hints on how to parametrize parallel EAs.

The tables also reveal that in certain situations there is a tradeoff between the expected parallel time and the communication effort. For instance, on LO the torus graph has the smallest communication effort of $O(n^2)$ at the expense of a higher parallel time bound of $O(n^{4/3})$. The complete graph has the smallest bound for the parallel time, $O(n)$, but the largest communication costs: $O(n^3)$. The hypercube provides a good compromise, combining the smallest bounds up to polylogarithmic factors. A similar observation can be made for $\mathrm{Jump}_k$, but there the hypercube is a better choice than the complete graph (strictly better in terms of communication costs and equally good in the parallel time bound). In all our examples the ring never performed better than torus graphs in either objective.

Future work should deal with lower bounds on the running time of parallel evo- 
lutionary algorithms. Also in our example functions no diversity was needed. Further 
studies are needed in order to better understand how the topology and the parame- 
ters of migration affect diversity, and how diversity helps for optimizing more difficult, 
multimodal problems. 
A Appendix

The following inequality was brought to our attention by Jon Rowe. A proof is found in [29, Lemma 3].

Lemma 4. For any $0 \le x \le 1$ and any $n > 0$,

$$(1-x)^n \le \frac{1}{1+nx}.$$
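One short way to see Lemma 4 is the chain $(1-x)^n \le e^{-nx} \le 1/(1+nx)$, where the first step uses $1 - x \le e^{-x}$ and the second uses $e^{y} \ge 1 + y$. A quick numerical spot check over a grid of values (not a proof):

```python
# check (1 - x)^n <= 1 / (1 + n x) on a grid of x in [0, 1] and n > 0
worst_gap = min(
    1 / (1 + n * x) - (1 - x) ** n
    for x in (i / 200 for i in range(201))
    for n in (0.5, 1, 2, 3, 10, 64, 1000)
)
print("smallest gap:", worst_gap)
```

The smallest gap is 0, attained at x = 0, where both sides equal 1.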

We also state Johannsen's variable drift theorem [15], in a version with slightly improved conditions [29].

Theorem 14 (Johannsen's Variable Drift Theorem [15, 29]). Consider a stochastic process $\{X_t\}_{t \ge 0}$ on $\{0, 1, \dots, m\}$, with $m \in \mathbb{N}$. Suppose there is a monotonic increasing function $h: \mathbb{R}^+ \to \mathbb{R}^+$ such that the function $1/h(x)$ is integrable on $[1, m]$, and with

$$E(X_t - X_{t+1} \mid X_t = k) \ge h(k)$$

for all $k \in \{1, \dots, m\}$. Then the expected first hitting time of state 0 is at most

$$\frac{1}{h(1)} + \int_1^m \frac{1}{h(x)}\,dx.$$
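To illustrate how Theorem 14 is applied, the following sketch evaluates the bound $1/h(1) + \int_1^m dx/h(x)$ numerically with Simpson's rule. It is applied to two drift functions from the main text: $h(1) = p$ and $h(x) = 2p$ for $x > 1$ (bidirectional ring, starting $k - 1$ steps away from k informed vertices), which should give $k/(2p)$, and $h(x) = x(\mu-x)p/(1+(\mu-x)p)$ on $[1, \mu/2]$ (complete graph), for which a direct calculation gives the closed form $1 + 1/((\mu-1)p) + \ln(\mu-1)/(p\mu) + \ln(\mu/2)$. The function names are ours.

```python
import math


def variable_drift_bound(h, m, steps=100_000):
    """1/h(1) + integral_1^m dx / h(x), with the integral computed by Simpson's rule."""
    if m <= 1:
        return 1 / h(1)
    steps += steps % 2                 # Simpson's rule needs an even number of panels
    width = (m - 1) / steps
    weighted = 1 / h(1) + 1 / h(m)
    for i in range(1, steps):
        weighted += (4 if i % 2 else 2) / h(1 + i * width)
    return 1 / h(1) + weighted * width / 3


def ring_drift(p):
    """Drift p when one vertex is left, 2p otherwise (bidirectional ring)."""
    return lambda x: p if x <= 1 else 2 * p


def complete_graph_drift(mu, p):
    """Lower bound on the drift with x uninformed vertices on K_mu."""
    return lambda x: x * (mu - x) * p / (1 + (mu - x) * p)


k, p = 20, 0.5
print(variable_drift_bound(ring_drift(p), k - 1), k / (2 * p))

mu, p = 64, 1 / 64
closed = 1 + 1 / ((mu - 1) * p) + math.log(mu - 1) / (p * mu) + math.log(mu / 2)
print(variable_drift_bound(complete_graph_drift(mu, p), mu / 2), closed)
```

The small discrepancy in the ring case (of order of the step width) comes from the jump of h at x = 1, which Simpson's rule smooths over.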
Table 2: Asymptotic bounds on expected parallel ($T^{\mathrm{par}}$, number of generations) and sequential ($T^{\mathrm{seq}}$, number of function evaluations) running times and expected communication efforts ($T^{\mathrm{com}}$, total number of migrated individuals) for various n-bit functions and island models with $\mu$ islands running the (1+1) EA and using migration probability p. The number of islands $\mu$ was always chosen to give the best possible upper bound on the parallel running time, while not increasing the upper bound on the sequential running time by more than a constant factor. For unimodal functions d + 1 denotes the number of function values. See [7] for bounds for the (1+1) EA. Results for $\mathrm{Jump}_k$ were restricted to $3 \le k = O(n/\log n)$ for simplicity.



